EPrints Technical Mailing List Archive

Message: #05490



Re: [EP-tech] subject dataset - removing subjectid from eprint


Hi Monica

I'm not saying it would be quick, but I'd be surprised if it really took an infeasible amount of time, even on a large repository. Loading records is fairly lightweight and trivial -- it's writing that takes time, and that would only happen for records that were changed by the script.
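To make the cost argument concrete, here is a minimal model in Python (not EPrints code) of the record-by-record approach: every record is read, but a write, the expensive step, only happens when a record actually changes. The records are hypothetical dicts standing in for eprint objects, and `strip_subject` is an illustrative name; in a real EPrints script the write would correspond to updating and committing the eprint.

```python
def strip_subject(records, fieldname, subjectid):
    """Remove `subjectid` from the multi-valued `fieldname` of each record.

    Returns (reads, writes): every record is read, but a write only
    happens when the field's value actually changes.
    """
    reads = writes = 0
    for record in records:
        reads += 1
        old = record.get(fieldname, [])
        new = [v for v in old if v != subjectid]
        if new != old:
            record[fieldname] = new  # stands in for the expensive update + commit
            writes += 1
    return reads, writes


if __name__ == "__main__":
    repo = [
        {"eprintid": 1, "collections": ["theses", "art"]},
        {"eprintid": 2, "collections": ["art"]},
        {"eprintid": 3, "collections": []},
        {"eprintid": 4, "collections": ["theses"]},
    ]
    reads, writes = strip_subject(repo, "collections", "theses")
    print(reads, writes)  # 4 2
```

Running it prints `4 2`: four records read, only two written.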

As you've identified, EPrints is trying to be 'clever' with the subject by searching for items at that level or below.  Now that the subject in question has been removed from the tree, this may be what's causing the problem.  Three solutions I would consider:

* Do a record-by-record iterative search over the whole repository.
* Reinstate the subject id using the subject editor, run the script, then remove the subject from the tree again.
* Identify the eprintids of items that have that subject set using a MySQL query, write them to a file, then write a script to load and modify each of those eprints.
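The third option can be sketched as follows, with an in-memory SQLite database standing in for MySQL. The table and column names (`eprint_collections.eprintid`, `eprint_collections.collections`) come from the debug query quoted in this thread; the real table may have more columns, and the function names are hypothetical. Because the query reads the link table directly instead of joining `subject_ancestors`, it still finds records whose subject has already been unlinked from the tree.

```python
import os
import sqlite3
import tempfile


def affected_eprintids(conn, subjectid):
    """Return the distinct eprintids linked to `subjectid`, ignoring the tree."""
    rows = conn.execute(
        "SELECT DISTINCT eprintid FROM eprint_collections "
        "WHERE collections = ? ORDER BY eprintid",
        (subjectid,),
    )
    return [eprintid for (eprintid,) in rows]


def write_id_file(path, ids):
    """Write one eprintid per line for a separate load-and-modify script."""
    with open(path, "w") as fh:
        fh.writelines(f"{i}\n" for i in ids)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eprint_collections (eprintid INTEGER, collections TEXT)")
    conn.executemany(
        "INSERT INTO eprint_collections VALUES (?, ?)",
        [(1, "theses"), (1, "art"), (2, "art"), (4, "theses")],
    )
    ids = affected_eprintids(conn, "theses")
    print(ids)  # [1, 4]
    write_id_file(os.path.join(tempfile.gettempdir(), "theses_ids.txt"), ids)
```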


 

Jisc

Adam Field
SHERPA services analyst developer


From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Monica Wood <monica.wood@utas.edu.au>
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Thursday, 10 March 2016 23:12
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint

Hi Adam,

I believe changing the search would return all the eprint items in the repository?
We have a massive repository, so I don’t think this would be a good option.

I have now done a bulk change and set the collections metafield to empty across all thesis item types.

However, to help with debugging the script, I ran it with the args FIELDNAME = collections and SUBJECTID = theses. If either of these were incorrect, the script would have returned an error.
I only did a dry run to see what it would output, but it never got to the part of the script that prints anything, which is why I’m assuming the search returned no results and $list is therefore empty.

As I mentioned in my previous email, I put the noise level up to 3 to find out exactly what was happening. This was the output:

Starting EPrints Repository.
Connecting to DB ... Database execute debug: SET NAMES 'utf8'
done.
Database execute debug: 
SELECT `eprint`.`eprintid` 
FROM `eprint`, `eprint_collections` AS `eprint_collections`, `subject_ancestors` AS `127395456subject_ancestors` 
WHERE `eprint`.`eprintid`=`eprint_collections`.`eprintid` 
AND `eprint_collections`.`collections`=`127395456subject_ancestors`.`subjectid` 
AND `127395456subject_ancestors`.`ancestors` = 'theses' 
GROUP BY `eprint`.`eprintid`

Ending EPrints Repository.


As you can see, it’s only returning those that match the eprint_collections.collections and the subject_ancestors.subjectid.  As I had removed the node ‘theses’ from the subject tree, it’s giving back no results from this query.  
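The empty result can be reproduced with a simplified SQLite model of the three tables in the logged query. The sample rows are invented for illustration, the MySQL backquotes and generated alias are dropped, and `subject_ancestors` is assumed to hold one (subjectid, ancestors) row per ancestor with each subject listed as its own ancestor. With 'theses' still in the tree the join finds record 1; once its ancestor rows are gone, the same query returns nothing, even though `eprint_collections` still references 'theses'. The helper names `build` and `search` are hypothetical.

```python
import sqlite3

QUERY = """
SELECT eprint.eprintid
FROM eprint, eprint_collections, subject_ancestors AS sa
WHERE eprint.eprintid = eprint_collections.eprintid
  AND eprint_collections.collections = sa.subjectid
  AND sa.ancestors = ?
GROUP BY eprint.eprintid
ORDER BY eprint.eprintid
"""


def build(linked):
    """Create the three tables; `linked` says whether 'theses' is still in the tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE eprint_collections (eprintid INTEGER, collections TEXT)")
    conn.execute("CREATE TABLE subject_ancestors (subjectid TEXT, ancestors TEXT)")
    conn.executemany("INSERT INTO eprint VALUES (?)", [(1,), (2,)])
    # Record 1 still points at 'theses' whether or not the subject is in the tree.
    conn.executemany(
        "INSERT INTO eprint_collections VALUES (?, ?)",
        [(1, "theses"), (2, "art")],
    )
    rows = [("art", "art"), ("art", "collections")]
    if linked:
        rows += [("theses", "theses"), ("theses", "collections")]
    conn.executemany("INSERT INTO subject_ancestors VALUES (?, ?)", rows)
    return conn


def search(conn, subjectid):
    """Run the tree-aware search: items at `subjectid` or below."""
    return [eprintid for (eprintid,) in conn.execute(QUERY, (subjectid,))]


if __name__ == "__main__":
    print(search(build(linked=True), "theses"))   # [1]
    print(search(build(linked=False), "theses"))  # []
```

This is also why temporarily reinstating the subject, as suggested in the thread, should let the filtered search find the records again.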

I’m wondering if something should be added to the UNLINK function in the Subject Tree, so that when you remove a node from the subject tree for good, any matching metafield values are also removed from the records?
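The cleanup suggested above could look roughly like this. It is a hypothetical sketch, not EPrints code: the tree is a plain parent-to-children mapping, records are dicts, and it assumes that unlinking a node also takes its descendants out of the tree, so values pointing at any of them are stripped too (mirroring the "this level or below" behaviour of the subject search). `subtree` and `unlink_and_clean` are illustrative names.

```python
def subtree(children, root):
    """Return the set containing `root` and all of its descendants."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def unlink_and_clean(children, parent, node, records, fieldname):
    """Detach `node` from `parent`, then strip the removed ids from records.

    Returns the eprintids of the records that were modified.
    """
    removed = subtree(children, node)
    children[parent] = [c for c in children.get(parent, []) if c != node]
    changed = []
    for record in records:
        old = record.get(fieldname, [])
        new = [v for v in old if v not in removed]
        if new != old:
            record[fieldname] = new
            changed.append(record["eprintid"])
    return changed
```

For example, unlinking 'theses' from a tree where it has children 'phd' and 'masters' would also clear a record tagged only 'phd'.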


Monica Wood
Library Systems Officer
Library | Division of Students & Education
University of Tasmania
Locked Bag 25
Hobart 7001
T +61 3 6226 1849
http://www.utas.edu.au/library


From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Adam Field <Adam.Field@jisc.ac.uk>
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Friday, 11 March 2016 at 12:21 AM
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint

I would suggest running the script over the whole repository.

Looking at John's script, change this:

my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [
        { meta_fields => [ $fieldname ],
          value => $subjectid }] );

To this:

my $list = $session->dataset('eprint')->search();

...and see what happens.

(though I agree with John that this shouldn't really make a difference).  If it doesn't work, please post exactly what you typed on the command-line to invoke the script.


 

Jisc

Adam Field
SHERPA services analyst developer


From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Wednesday, 9 March 2016 06:36
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint

Interesting...
You could try adding the subject back into the tree temporarily to see if it works that way?

Using this script should cause any affected EPrints' summary pages to be regenerated - if you alter the database directly, you'd have to do this by running bin/generate_abstracts.

Cheers,
John

From:eprints-tech-bounces@ecs.soton.ac.uk <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Monica Wood <monica.wood@utas.edu.au>
Sent: 09 March 2016 04:33:34
To: 'eprints-tech@ecs.soton.ac.uk'
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint
 
Hi John,

Thanks for linking me to this script.
I’ve had a look through it and tried it out, but it’s not working. I believe this is because I’ve already removed the node from the subject tree (Unlinked it from the tree).

Putting the noise level up to 3 on the script gives me some feedback on a query it runs at, I believe, this line:
my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [
        { meta_fields => [ $fieldname ],
          value => $subjectid }] );
The query (with fieldname set to collections and subjectid set to theses) is:
Database execute debug:
SELECT `eprint`.`eprintid`
FROM `eprint`, `eprint_collections` AS `eprint_collections`, `subject_ancestors` AS `127395456subject_ancestors`
WHERE `eprint`.`eprintid`=`eprint_collections`.`eprintid`
AND `eprint_collections`.`collections`=`127395456subject_ancestors`.`subjectid`
AND `127395456subject_ancestors`.`ancestors` = 'theses'
GROUP BY `eprint`.`eprintid`
This is returning an empty list, as the theses subjectid no longer exists in subject_ancestors, but it does still exist in eprint_collections.
I’ll have a go at bulk changing the records from the GUI; if that doesn’t work out, I’ll do a bulk change directly in the database by removing the entries in eprint_collections that point to the theses subjectid.
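A minimal sketch of that direct-database fallback, again with SQLite standing in for MySQL and `eprint_collections` simplified to the two columns seen in the debug query (the real table may have others, such as an ordering column, which this sketch does not touch). It collects the affected eprintids before deleting, because editing the database directly bypasses EPrints: as John notes, the summary pages for those records then need regenerating with bin/generate_abstracts. `delete_subject_links` is a hypothetical name.

```python
import sqlite3


def delete_subject_links(conn, subjectid):
    """Delete every eprint_collections row pointing at `subjectid`.

    Returns the affected eprintids so their summary pages can be regenerated.
    """
    with conn:  # one transaction: commit on success, roll back on error
        ids = [
            eprintid
            for (eprintid,) in conn.execute(
                "SELECT DISTINCT eprintid FROM eprint_collections "
                "WHERE collections = ? ORDER BY eprintid",
                (subjectid,),
            )
        ]
        conn.execute(
            "DELETE FROM eprint_collections WHERE collections = ?", (subjectid,)
        )
    return ids


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eprint_collections (eprintid INTEGER, collections TEXT)")
    conn.executemany(
        "INSERT INTO eprint_collections VALUES (?, ?)",
        [(1, "theses"), (1, "art"), (4, "theses")],
    )
    print(delete_subject_links(conn, "theses"))  # [1, 4]
```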
Cheers,
Monica Wood

Library Systems Officer
Library | Division of Students & Education
University of Tasmania
Locked Bag 25
Hobart 7001
T +61 3 6226 1849
http://www.utas.edu.au/library

From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Reply-To: "'eprints-tech@ecs.soton.ac.uk'" <eprints-tech@ecs.soton.ac.uk>
Date: Tuesday, 8 March 2016 at 10:06 PM
To: "'eprints-tech@ecs.soton.ac.uk'" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - remove_field

Hi Monica,

I think your suggestion will remove the field itself, rather than a specific value stored in that field.

 

I’ve done something similar – just added it to the wiki for you:

https://wiki.eprints.org/w/Remove_subjectid_script

 

Let me know if it doesn’t work for you.

 

Cheers,

John

 

 

From:eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Monica Wood
Sent: 08 March 2016 06:17
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] subject dataset - remove_field

 

Hi there,

 

In our repository we have a root subject called ‘Collections’. Under this, I have unlinked (deleted) a child of Collections.

I now have the issue that all items that were connected to this collection still have metadata saying so, and on our summary page we display the collection an item belongs to.

So it’s now showing ‘??collectionName??’ as a link, and that link is now dead.

 

Is there a way to delete these connections without needing to do it directly through the database? 

I was wondering if the epadmin remove_field might do the job on the subject dataset?

Something like:

~/bin/epadmin remove_field repoid subject collectionid ??

 

Thanks in advance

Monica Wood
Library Systems Officer
Library | Division of Students & Education
University of Tasmania
Locked Bag 25
Hobart 7001
T +61 3 6226 1849
http://www.utas.edu.au/library

Available Times

Tues: 9am – 5pm

Wed: 1pm – 5pm

Fri: 9am – 5pm

 


